[SPARK-9973][SQL] Correct buffer size #8189
Closed
viper-kun wants to merge 1 commit into apache:master from viper-kun:errorSize
Conversation
Contributor (Author)
@liancheng @scwf Is it OK?
Contributor
Could you file a JIRA ticket and update the PR title to
Contributor
Jenkins, this is ok to test.

Test build #40973 has finished for PR 8189 at commit
Contributor
Thanks, I'm merging this to master. @rxin Is it OK to have this one in branch-1.5 at this time?
Contributor
@viper-kun Please add your name and email address to GitHub so that we can include that information while merging your PRs. I've added your information, gathered from JIRA, by hand this time.
Contributor
This is fine for 1.5.
asfgit pushed a commit that referenced this pull request on Aug 16, 2015:

The `initialSize` argument of `ColumnBuilder.initialize()` should be the number of rows rather than the number of bytes. However, `InMemoryColumnarTableScan` passes in a byte size, which makes Spark SQL allocate more memory than necessary when building in-memory columnar buffers.

Author: Kun Xu <viper_kun@163.com>
Closes #8189 from viper-kun/errorSize.
(cherry picked from commit 182f9b7)
Signed-off-by: Cheng Lian <lian@databricks.com>
CodingCat pushed a commit to CodingCat/spark that referenced this pull request on Aug 17, 2015 (same commit message as above; closes apache#8189 from viper-kun/errorSize).
When caching a table in memory in Spark SQL, we allocate too much memory.

In InMemoryColumnarTableScan:

val initialBufferSize = columnType.defaultSize * batchSize
ColumnBuilder(attribute.dataType, initialBufferSize, attribute.name, useCompression)

In BasicColumnBuilder:

buffer = ByteBuffer.allocate(4 + size * columnType.defaultSize)

Since `size` here is the `initialBufferSize` passed in above, the total allocated size is 4 + batchSize * columnType.defaultSize * columnType.defaultSize. It should be 4 + batchSize * columnType.defaultSize.
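A minimal sketch of the over-allocation arithmetic, using illustrative values (batchSize = 10000 rows and an 8-byte column type such as LongType; these numbers are assumptions, not from the PR):

```scala
object BufferSizeSketch {
  def main(args: Array[String]): Unit = {
    val batchSize = 10000 // rows per in-memory batch (illustrative)
    val defaultSize = 8   // bytes per value for the column type (illustrative)

    // Before the fix: InMemoryColumnarTableScan passed a byte count
    // (defaultSize * batchSize) where a row count was expected, so the
    // builder multiplied by defaultSize a second time.
    val initialBufferSize = defaultSize * batchSize
    val buggyAllocation = 4 + initialBufferSize * defaultSize

    // After the fix: pass the row count, so the builder allocates
    // 4 header bytes plus one defaultSize slot per row.
    val fixedAllocation = 4 + batchSize * defaultSize

    println(s"buggy: $buggyAllocation bytes") // 640004
    println(s"fixed: $fixedAllocation bytes") // 80004
  }
}
```

With these numbers the buggy path allocates defaultSize times more than needed (640004 bytes instead of 80004), which matches the commit message's observation that Spark SQL allocated more memory than necessary when building in-memory columnar buffers.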